
Conversation

@mcowger
Contributor

@mcowger mcowger commented Aug 27, 2025

Context

Forcibly setting this value can force the Ollama server to perform an unexpected reload if its configured context differs from our built-in defaults. Removing it ensures the new native-ollama behavior is the same as the previous behavior.

A followup PR will be added to allow this to be overridden in the UI.

Fixes: #2060

Thanks to jebba7151 for finding the root cause.

Implementation

Remove num_ctx: modelInfo.contextWindow from the client.chat() call.
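
In the ollama JS client this boils down to no longer passing num_ctx in the request options. A minimal before/after sketch (illustrative names, not the exact Kilo Code source):

import { Ollama, type Message } from "ollama"

const client = new Ollama({ host: "http://localhost:11434" })

// Before: num_ctx was sent on every request, so Ollama reloaded the model whenever
// the value differed from the context it was already loaded with.
async function chatBefore(model: string, messages: Message[], contextWindow: number) {
    return client.chat({ model, messages, options: { num_ctx: contextWindow } })
}

// After: num_ctx is omitted, so the server keeps whatever context it was configured
// with (Modelfile PARAMETER num_ctx, OLLAMA_CONTEXT_LENGTH, or its default).
async function chatAfter(model: string, messages: Message[]) {
    return client.chat({ model, messages })
}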

Screenshots

NA

How to Test

  1. Set up an Ollama connection with a model that has a 16K context configured but defaults to > 16K in Kilo (see the example Modelfile below).
  2. Send a completion request.
  3. Ensure the model does not reload due to a config change.
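
For step 1, one way to get a model with a 16K context is a Modelfile along these lines (qwen3:0.6b is just an example base; any model whose Kilo default exceeds 16K works):

FROM qwen3:0.6b
PARAMETER num_ctx 16384

Register it with ollama create <some-name> -f Modelfile and point Kilo at the resulting model.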

Get in Touch

mcowger on Discord.

@changeset-bot

changeset-bot bot commented Aug 27, 2025

🦋 Changeset detected

Latest commit: d0efa75

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
  Name        Type
  kilo-code   Patch


@chrarnoldus
Collaborator

This change causes Ollama to truncate prompts at 4096 tokens, which breaks Kilo Code completely.

time=2025-08-27T22:31:30.253+02:00 level=WARN source=runner.go:128 msg="truncating input prompt" limit=4096 prompt=9089 keep=4 new=4096

@mcowger
Contributor Author

mcowger commented Aug 28, 2025

This change causes Ollama to truncate prompts at 4096 tokens, which breaks Kilo Code completely.

time=2025-08-27T22:31:30.253+02:00 level=WARN source=runner.go:128 msg="truncating input prompt" limit=4096 prompt=9089 keep=4 new=4096

So I don't think it does, really.

Many Ollama models default to 4k even if they support more (qwen3-0.6b is my test example).

So if I run with this patch against a default-configured model like qwen3-0.6b, I get the same output because it's configured for 4k:

❯ ollama show qwen3:0.6b
  Model
    architecture        qwen3
    parameters          751.63M
    context length      40960
    embedding length    1024
    quantization        Q4_K_M

But if I create a Modelfile that pushes that up (or if the model natively supports a longer context):

FROM qwen3:0.6b
PARAMETER temperature 0.6
PARAMETER num_ctx 32768
PARAMETER top_k 20

It handles the request just fine.

So I don't think this change breaks Kilo with Ollama; it just prevents Kilo from operating with models that have unacceptably small defaults.

FWIW, we'll also inherit this from Roo anyways: RooCodeInc/Roo-Code#7454

@chrarnoldus
Collaborator

We do read the num_ctx from the model parameters:

? parseInt(rawModel.parameters.match(/^num_ctx\s+(\d+)/m)?.[1] ?? "", 10) || undefined
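
Spelled out, that fragment sits in an expression roughly like this (a sketch of the surrounding code, not the exact source):

const contextWindow = rawModel.parameters
    ? parseInt(rawModel.parameters.match(/^num_ctx\s+(\d+)/m)?.[1] ?? "", 10) || undefined
    : undefined

i.e. an explicit PARAMETER num_ctx from the model's parameter dump is used when present; otherwise the value is left undefined.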

I just tested this in the release version with devstral with num_ctx set to 32k and it seems to work, so I'm not sure what problem you're trying to solve:

llama_context: n_ctx_per_seq (32000) < n_ctx_train (131072) -- the full capacity of the model will not be utilized

@mcowger
Contributor Author

mcowger commented Aug 28, 2025

We do read the num_ctx from the model parameters:

? parseInt(rawModel.parameters.match(/^num_ctx\s+(\d+)/m)?.[1] ?? "", 10) || undefined

I just tested this in the release version with devstral with num_ctx set to 32k and it seems to work, so I'm not sure what problem you're trying to solve:

llama_context: n_ctx_per_seq (32000) < n_ctx_train (131072) -- the full capacity of the model will not be utilized

The issue is the defaults, and Ollama's behavior when we specify a different value.

  1. The model is loaded with a context window value of 32768 (either as a default or manually overridden by the user in a Modelfile or via a CLI option).
  2. The user configures the model in Kilo. We explicitly send a num_ctx value, which is not correctly calculated (because Ollama reports different values in different places). If we send, say, "40960", this forces Ollama to reload the model with the new value, even if that value is incompatible with the user's hardware (e.g. it won't fit into VRAM), killing performance (and also causing an expensive model reload).

I've just pushed a new commit that solves this a little better. When the handler is initialized, we interrogate the model info more thoroughly and use it to estimate whether the completion request will fit. If not, we throw an error rather than setting num_ctx and forcing a model reload.
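
Roughly, the idea is (a sketch with illustrative names, not the exact code in the commit):

import { Ollama } from "ollama"

// Read an explicit `PARAMETER num_ctx` from the model's `ollama show` output, if any.
async function getConfiguredContext(client: Ollama, model: string): Promise<number | undefined> {
    const info = await client.show({ model })
    return parseInt(info.parameters?.match(/^num_ctx\s+(\d+)/m)?.[1] ?? "", 10) || undefined
}

// Fail fast with a clear error instead of sending num_ctx and forcing a reload.
function assertPromptFits(promptTokens: number, configuredContext: number | undefined) {
    const limit = configuredContext ?? 4096 // assume Ollama's usual default when nothing is configured
    if (promptTokens > limit) {
        throw new Error(
            `Prompt is ~${promptTokens} tokens but the model's context is ${limit}. ` +
                `Raise num_ctx in the Modelfile or OLLAMA_CONTEXT_LENGTH on the server.`,
        )
    }
}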

Take a look and let me know your thoughts.

@chrarnoldus
Collaborator

Your latest commit works if you manually set PARAMETER num_ctx, but if you don't do that you still get a truncated prompt (and an infinite loop because of it). I don't think that's acceptable as a default experience.

@mcowger
Contributor Author

mcowger commented Aug 28, 2025

Your latest commit works if you manually set PARAMETER num_ctx, but if you don't do that you still get a truncated prompt (and an infinite loop because of it). I don't think that's acceptable as a default experience.

OK. I think it's a tough spot:

  1. The current behavior causes Ollama to reload and perform poorly or crash, and it ignores parameters the user explicitly chose on the Ollama side. It's invisible: it isn't clear what's happening or why the performance / reliability changes.
  2. This solution (which is already incoming from Roo anyway, though my update today makes it a little better) results in a clear error about what went wrong and why.

I'll defer to the Kilo team on your preference.

@chrarnoldus
Collaborator

We would love to work on a solution together; it is clear the community is not happy with the current Ollama performance. But there does need to be a solution to the prompt truncation problem, because that is the first thing a new user will see, and there is no clear error message in that case.

A followup PR will be added to allow this to be overridden in the UI.

How about implementing this proposal? I tried to implement it before (#1975), but had issues getting the value to sync properly on change (probably nothing insurmountable).

FWIW, we'll also inherit this from Roo anyways: RooCodeInc/Roo-Code#7454

That is a bot-generated PR, so there is no guarantee of quality.

@mcowger
Contributor Author

mcowger commented Aug 29, 2025

there is no clear error message in that case

There is. With my updated commit, any request that exceeds the model's reported or expected limit triggers an explicit error saying the context is too long.

But I'll leave this to the Kilo team to solve in a way that works for you.

@chrarnoldus
Collaborator

There is. With my updated commit, any request that exceeds the model's reported or expected limit triggers an explicit error saying the context is too long.

I tested your latest commit, but couldn't get the error to show up for a vanilla model. I added a commit that I think fixes it.

But I'll leave this to the Kilo team to solve in a way that works for you.

We don't use Ollama regularly so your input is very valuable. Please let me know what you think!

Comment on lines +108 to +114
## Preventing prompt truncation

By default Ollama truncates prompts to a very short length.
If you run into this problem, please see this FAQ item to resolve it:
[How can I specify the context window size?](https://github.com/ollama/ollama/blob/4383a3ab7a075eff78b31f7dc84c747e2fcd22b8/docs/faq.md#how-can-i-specify-the-context-window-size)

If you decide to use the `OLLAMA_CONTEXT_LENGTH` environment variable, it needs to be visible to both the IDE and the Ollama server.
Collaborator


This is the real change in this file; the rest is forced autoformat.

@mcowger
Contributor Author

mcowger commented Aug 29, 2025

Nice, I like the use of the ENV var.

@chrarnoldus chrarnoldus merged commit c509f12 into Kilo-Org:main Aug 29, 2025
11 checks passed
@chrarnoldus
Collaborator

Thanks for your contribution @mcowger

@mcowger mcowger deleted the mcowger/ollamaContext branch October 27, 2025 18:29
suissa pushed a commit to suissa/neurohive-kilocode that referenced this pull request Oct 29, 2025
Remove the forced override of the context limit for Ollama API Kilo-Org#2060


Development

Successfully merging this pull request may close these issues.

Local models Ollama provider using CPU instead of GPU

3 participants